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Abstract 

Forecast ensembles are typically employed to account for prediction uncertainties 
in numerical weather prediction models. However, ensembles often exhibit biases and 
dispersion errors, thus they require statistical post-processing to improve their predic¬ 
tive performance. Two popular univariate post-processing models are the Bayesian 
model averaging (BMA) and the ensemble model output statistics (EMOS). 

In the last few years increased interest has emerged in developing multivariate 
post-processing models, incorporating dependencies between weather quantities, such 
as for example a bivariate distribution for wind vectors or even a more general setting 
allowing to combine any types of weather variables. 

In line with a recently proposed approach to model temperature and wind speed 
jointly by a bivariate BMA model, this paper introduces a bivariate EMOS model for 
these weather quantities based on a truncated normal distribution. 

The bivariate EMOS model is applied to temperature and wind speed forecasts 
of the eight-member University of Washington mesoscale ensemble and of the eleven- 
member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service and its 
predictive performance is compared to the performance of the bivariate BMA model 
and a multivariate Gaussian copula approach, post-processing the margins with uni¬ 
variate EMOS. While the predictive skills of the compared methods are similar, the 
bivariate EMOS model requires considerably lower computation times than the bivari¬ 
ate BMA method. 

Key words: Ensemble model output statistics, Gaussian copula, energy score, ensemble 
calibration, Euclidean error, truncated normal distribution. 
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1 Introduction 


Accurate and reliable prediction of future states of the atmosphere is the most important ob¬ 
jective of weather prediction. These forecasts are issued on the basis of observational data and 
numerical weather prediction (NWP) models, which are capable to simulate the atmospheric 
motions taking into account the physical governing laws of the atmosphere and the connected 
spheres (typically sea or land surface). The NWP models consist of sets of partial differential 
equations which have only numerical solutions and strongly depend on initial conditions. In 
order to reduce the uncertainties caused by the possibly unreliable initial conditions and 
the numerical weather prediction process itself, o ne can run the models with various initial 
conditions resulting in a forecast ensemble flLeithl. 197411 . Using a forecast ensemble not only 
the classical point forecasts (ensemble median or mean) can be obtained, but also an esti- 
mate o f th e distr ibution of the future weather variable, which allows probabilistic forecasting 
flGneiting and Raftervl. 120051) . The first operational implementation of the ensemble predic¬ 
tion method dates back to the nineties flBuizza et all 11993 : Toth and Kalnav. 19971) and in 
the last twenty years it became a widely used technique in the meteorological community. 
Recently all major national meteorological services operate their own ensem b le pre diction 
systems (EPSs), see, e.g., the PEARF0EPS of Meteo France ( Descamps et al. 201 li) or the 
COSMO-DeE| EPS of the German Meteorological Service (DWD; Bouallegue et al . 2013 ). 
whereas the most well-known organization issuing ense mble fo recas ts is the European Centre 
for Medium-Range Weather Forecasts (jECMWF Directorate!. 120121). However, as it has been 
observed with several operational EPSs (see, e.g-. IBuizza et al. 2005)), the forecast ensemble 
is usually underdispersive and consequently badly calibrated. One possible improvement 
area of the ensemble forecasts is the statistical post-processing of the ensemble in order to 
transform the original ensemble member-based probability density function (PDF) into a 
more reliable and realistic one. 


From the v a rious post-processing techniques (for an overview see, e.g., iGneitingl 120141: 
Williams et all 12014!) probab l y the most popular approaches are the Bayesian model av¬ 
eraging (BMA; iRafterv et all 120051) and the ensem ble model output statistics (EMOS) or 
non-homogeneous regression flGneiting et al. l2005f) . These meth ods are partia lly imple¬ 


mented in the ensembleBMA and enserableMOS packages of R ( Fralev et al . 1 2 201 lh and both 
approaches provide estimates of the distributions of the predictable weather quantities. 

In the case of the BMA the predictive probability density function (PDF) of a future 
weather quantity is a weighted mixture of individual PDFs corresponding to the members of 
the ensemble, where the weights express the relative performance of the ensemble members 
during a given training period. The BMA models of various weather quantities differ only 
in the PDFs of the mixture comp onents. For temperature and sea level pressure a n ormal 
distribution (' Rafterv et all 12005 ). for wind speed a gamma (Slaughter et al. 2010j) or a 


1 PE ARP: Prevision d’Ensemble ARPege 

2 COSMO: Consortium for Small scale Modeling 
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truncated normal distr ibution (iBaranl . 2 0141 ). whereas for surface wind direction a von Mises 
distribution (iBao et all 120101 ) is suggested. 

The EMOS predictive PDF uses a single parametric distribution with parameters depend¬ 
ing on the ensemble members. EMOS models have alre ady been developed for calibrating 
ensemble forecasts of temperature and sea level pressure flGneiting et al\. 1200511 . wind speed 
( Thorarinsdottir and Gneiting . 2010 : Lerch and ThorarinsdottirT 2013 : Baran and Lerch . 2015 ) 
and precipitation flScheuerer . 20141 ). 

Besides the calibration of univariate weather quantities recently an increasing interest 
has appeared in modelin g corre lations between the different weather variables. In the spe- 
cial case of wind vect ors IPinsonl (120121) suggeste d an adaptive calibration technique, whereas 


Schuhen et all (120121) and lSloughter et all ( 20131) introduced bivariate EMOS and BMA mod 


els, respectively. Further, iMoller et all (hoidh developed a general approach where after 


univariate calibration of the weather variables the component predictive PDFs are joined 
into a multivariate predictive density with the help of a Gaussi a n co pula. Another idea 
appears in the ensemble copula coupling (ECC) method Schefzik et al. (120131 ) where after 
univariate calibration the ran k orde r information in the raw ensemble is used to restore corre¬ 
lations. Finally, Baran and Moller (120151) developed a BMA model for joint post-pro cessing 
of ensemble forecasts of wind speed and temperature. 

In the present paper we introduce an EMOS model for joint calibration of wind speed 
and temperature which is based on a truncated normal distribution with cut-off at zero in its 
first (wind) coordinate. The method is tested on the ensemble forecasts of wind speed and 
tem perature of t he eig ht-member University of Washington Mesoscale Ensemble (UWME; 


Eckel and Massl. 1200511 and of the Limited Area Model EPS of the H ungarian Meteorolog¬ 


ical Service (HMS) called ALADIN-HUNEP^f] ( Horanvi et al . 2011 ). The performance of 
the EMOS model is compare d to th e forecasting skills of the previously invest igated BMA 


metho d of iBaran and Mollerl (120151) and to the Gaussian copula approach of IMoller et al. 


(12013h . where the margins of the multivariate predictive distribution are estimated by EMOS. 


2 Data 


2.1 University of Washington mesoscale ensemble 


The eight-member University of Washington mesoscale ensemble covers the Pacific North¬ 
west region of western North America providing forecasts on a 12 km grid. The ensem¬ 
ble members are obtained from different runs of the fifth generation Pennsylvania State 
University-National Center for Atmospheric R esearc h mes oscale model (PSU-NCAR MM5) 

Our data base (identical 
contains ensembles of 48- 


with initial condit ions from dif f erent s ources (IGrell et al. 


to the one used in IMoller et all (120131) ; IBaran and Mollerl (120151 


3 ALADIN: Aire Limitee Adaptation dynamique Development International 
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hour forecasts and corresponding validation observations of 10 meter maximum wind speed 
(maximum of the hourly i nstantaneous wind speeds over the previous twelve hours, given 
in m/s, see, e.g., ISlonghter et all (1201011 1 and 2 meter minimum temperature (given in K) 
f or 1 52 stations in the Automated Surface Observing Network ((National Weather Service! . 
1998 1 in the US states of Washington, Oregon, Idaho, California and Nevada for calendar 
years 2007 and 2008. The forecasts are initialized at 0 UTC (5 pm local time when daylight 
saving time (DST) is in use and 4 pm otherwise) and the generation of the ensemble implies 
that its members are not exchangeable. In the present study we investigate only forecasts 
for calendar year 2008 with additional data from 2007 used for parameter estimation. After 
removing days and locations with missing data, 90 stations remained where the number of 
days for which forecasts and validating observations are available varies between 141 and 
290. 


2.2 ALADIN-HUNEPS ensemble 


The ALADIN-HUNEPS system of the HMS covers a large part of Continental Europe with 
a horizontal resolution of 8 km and it is obtained by dynamical downscaling (by the AL- 
ADIN limited area model) of the global AR PEGhfjl based PEARP system of Meteo France 
( Horanvi et al . 2006; Descamps et a,l\. 12014th The ensemble consists of 11 members, 10 ini¬ 
tialized from perturbed initial conditions and one control member from the unperturbed 
analysis, implying that the ensemble contains groups of exchangeable forecasts. The data 
base contains 11 member ensembles of 42-hour forecasts for 10 meter instantaneous wind 
speed (given in m/s) and 2 meter temperature (given in K) for 10 major cities in Hun¬ 
gary (Miskolc, Szombathely, Gyor, Budapest, Debrecen, Nyiregyhaza, Nagykanizsa, Pecs, 
Kecskemet, Szeged) produced by the ALADIN-HUNEPS system of the HMS, together with 
the corresponding validating observations for the one-year period between April 1, 2012 and 
March 31, 2013 and for the period from October 1, 2010 to March 25, 2011. The forecasts 
are initialized at 18 UTC (8 pm local time when DST operates and 7 pm otherwise). The 
data sets are fairly complete since there are only six and three days, respectively, when no 
forecasts are available and these days have been excluded from the analysis. 


3 Ensemble Model Output Statistics 


As mentioned in the Introduction, the EMOS predictive PDF of a weather quantity (vector) 
X is a single parametric density function where the parameters depend on the ensemble 
members. For temperature and pressure a normal distribution can be fit reasonably well 


( Gne iti ng et al. I2005lh w hile for wind vectors a bivariate normal distribution can be ap¬ 
plied (jSchuhen et all 2012 ). However, for modeling non-negative quantities such as wind 
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Once the predictive density is given, its mean or median can be taken as a point forecast 
for X. In one dimension the definition of the latter is obvious, whereas for a d -dimensional 
cumulative distribution function (CDF) F a multivariate median is a vector minimizing 
the function 


<K«) 


a — x\\F(dx), 


where 


denotes the Euclidean norm. If F is not concentrated on a line in 


then 


the median is unique (Milasevic and Ducharme, 1987). 


Denote by f L , f 2 , ■ ■ ■, f m the ensemble of distinguishable forecast vectors of wind 
speed and temperature for a given location and time. This means that each ensemble 
member can be identified and tracked, which holds for example for the UWME (see Section 
I2U). However, most of the currently used ensemble prediction systems provide ensembles 
where at least some members are statistically indistinguishable. Such ensemble systems are 
simulating uncertainties by perturbing the initial conditions, and they usually have a control 
member (the one without any perturbation), whereas the remaining ensemble members form 
one or two exchangeab le group s. This is the case, e.g., for the 51 member ECMWF ensemble 
flLeutbecher and Palmerl . 12 0081 ) or for the ALADIN-HUNEPS ensemble described in Section 

m 


In what follows, if we have M ensemble members divided into m exchangeable groups, 
where the fcth group contains M k > 1 ensemble members (^(A, M k = M), notation 
f k e will be used for the 7'th member of the fcth group. 


3.1 Bivariate truncated normal model 


Denote by A/^A/T E) the bivariate normal distribution with location vector /x, scale matrix 
E, and first coordinate truncated from below at zero. Let 


M = 


Hw 

dT 


and 


E = 


&W a WT 
2 

awT &T 


If E is regular, then the PDF of this distribution is 

1 /2 

9( " |M - E) :=exp ('- l (x ~ ~ 


x = 


Xw 

X T 


\ (3-1) 
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where $ denotes the CDF of the standard normal distribution a nd by Ih w e den ote the 


indicator function of a set H. Short calculation shows (see, e.g., Rosenbaum! . 1196111 . that 
the mean vector k and covariance matrix 5 of A/" 2 °(r, S) are 


K, = [l + 


<, p(cw/^W ) 


a w 

cwt/cw 


and 


= E- 


(?w${nw/vw) \$(cw/vw) 


a w 

&WT 


OwT 


a 


WT 


ja- 


respectively, where ip denotes the PDF of the standard normal distribution. 

The proposed EMOS predictive distribution of wind speed and temperature is 

M"(A- !!,/,+■■■ + B M f M , C + DSD t ) 


(3.2) 


with 


S : = 


1 


M 


E(^-/)(a-/) t . 


M - 1 


k =1 


where f denotes the ensemble mean vector. Parameter vector /lei 2 and two-by-two 
real parameter matrices Bi,, Bm and CL D of model (13 .2)1 . where C is assumed to 
be symmetric and non-negative definite, can be estimated from the training data consisting 
of ensemble members and verifying observations from the preceding n days, by optimizing 
with respect to the mean logarithmic score, i. e., the nega tive logarithm of the predictive PDF 
evaluated at the verifying observation (iGneiting et all 2008). We remark that under the 
assumption of independence in space and time, this approach is equivalent to the maximum 
likelihood method. Obviously, the forecast errors are usually not independent, however, 
since one is estimating the conditional distribution of a single weather quantity vector with 
respect to the corresponding f orecasts, the parameter estimates are not really sensitive to 
this assumption (see, e.g.. [Rafter y et al l 120051) . 

If the ensemble can be divided into groups of exchangeable members, ensemble mem- 
bers w ithin a given group will get the same coefficient matrix of the location parameter 


(Fraley et al. 


2010 


Gneitind. 2014) resulting in a predictive distribution of the form 


Mi 


M n 


V 2 °( A + B i £ f lA + --- + B m Y, + DSD t ], 

*1=1 


(3,3) 


.-1 


where again, S denotes the empirical covariance matrix of the ensemble. 


3.2 Verification scores 

To investigate the predictive skills of the pro babilistic and point forecasts we apply the 
multivariate scores proposed by IGn eiting et al. fl2008i h 
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The first step is to check the calibration of probabilistic forecasts, which notion means 
a statistical consistency between the predictive distributions and the observations (see, e.g., 
Thorarinsdottir and Gneitind. l2010fh For one-dimensional ensemble forecasts a frequently 
used tool for this purpose is the verification rank histogram, i.e., the hist ogra m of ranks of 
validating observations with respect to the ensemble forecasts (see, e.g., Wilks, 2011, Section 
8.7.2). The closer the distribution of the ranks to the uniform distribution on {1,2 ,,M + 
1}, the better the calibration. The deviation from uniformity can be quantified by the 
reliability index A defined as 

M+l . 


A := V \p r - 

^ r M + i 

r =1 

where pj is the relative frequency of rank r (belle Monache et al . 2006 ). In the multiy ari- 


ate case the proper definition of ranks is not obvious. Similar to 
the present work we use the multivariate ordering proposed by 

Baran and Mo 

lei 

( 

2015), in 

Gneiting et al. 

(200* 

$). For a 

probabilistic forecast one can calculate the reliability index from a preferably large number 


of ensembles (we use 100) sampled from the predictive PDF and the corresponding verifying 
observations. 

For evaluating multivariate density forecasts the mo st pop ula r scor ing rules are the log¬ 
arithmic score and the energy score (ES), introduced by Gneiting and Rafter yi (120071 ). Both 
the logarithmic and the energy score are proper scoring rules which are negatively oriented, 
that is, the smaller the better, and the latter is a direct multivariate extension of the continu¬ 
ous ranked probability score (CRPS). Given a predictive CDF F on and a d -dimensional 
observation x, the energy score is defined as 

ES(F, x) ■= E\\X - x|| - ^E\\X - X'||, 

where X and X' are independent random vectors with CDF F. However, for the 
bivariate truncated normal distribution the energy score cannot be given in a closed form, 
so it is replaced by a Monte Carlo approximation 


n— 1 


^^ 1 1 
ES (F ,*) := f E II - *11 - WTTT E ll x J - 


n ' " ' 2 (n — 1) 

3 =1 v ' 3 = 1 


(3.4) 


where X\, X 2 , ■ ■ •, X n is a (large, we use n = 10000) random sample from F (Gneiting et al 


20080. Finally, if F is a CDF corresponding to a forecast ensemble f l , f 2 ,..., f M then 
(13.4p reduces to 

1 M ^ MM 

ES (E.*) = 17 E II *11 9 A/f 2 EE -Ail- 


3 = 1 


j =1 k =1 


Besides the proper calibration, probabilistic forecasts should result in sharp predictive 
distributions. In the univariate case this usually means small standard deviations leading 




































to narrow central prediction intervals. For a d-dimensional quantity one can consider the 
determinant sharpness (DS) defined by 


DS := (det(E)) 


1/(2 d ) 


where £ is the covariance matrix of an ensemble or of a predictive PDF. 

Finally, point forecasts (median and mean) can be evaluated using the mean Euclidean 
distance (EE) of forecasts from the corresponding validating observations. For multivariate 
fo recasts the ensemble median can be obtained, e.g. , using the Newton-type algorithm given 


m 


Dennis and Schnabell (119831 ) or the algorithm of IVardi and Zhang (120001) . For a detailed 
comparison of different algorithms, see, e.g., Fritz et al. ( 2012 ). Given a predictive CDF, to 


determine the corresponding median the chosen algorithm might be applied on a preferably 
large sample from this distribution. 


3.3 Parameter estimation 


There are two possible approaches to the choice of training data for est i mating the unknown 


para meters of the various EMOS models (Thorarinsdottir and Gneiting, 2010 


Schuhen et al. 


20121) . The regional EMOS technique uses ensemble forecasts and validating observations 
from a rolling training period for all available stations. In this way, one gets a universal set 
of parameters across the entire ensemble domain, which is then used at all observation sites. 
E.g., in case of the ALADIN-HUNEPS ensemble this means a single set of parameters for all 
ten cities. In contrast, local EMOS produces distinct parameter estimates for the different 
stations by using only the training data of the given station. These training sets contain 
only one observation per day, so local EMOS models require long training periods. 

Now, e.g., in the bivariate model (13. 3 1) the number of free parameters to be estimated 
is 4m + 10, which means 14 parameters even in the simplest case of a single exchangeable 
ensemble group. Hence, for estimating the parameters of models (13.2j) and (13. 3 p only the 
regional EMOS approach is applicable. 

The mean logarithmic score is optimized numerically wit h the help of the optim function 
in R, using principally the Nelder-Mead (INelder and Meadi . 19651) algorithm. This method 
is slower but m ore robust than the popular Broyden-Fletcher-Goldfarb-Shanno (BFGS) al¬ 
gorithm ( Press et all . 12007 . Section 10.9), which in case of a small training set becomes 
unstable. Both optimization methods require initial values, and the starting values of the 
location parameters A and Bi,... ,Bm are coefficients of the bivariate linear regression 
of the observations on the ensemble forecasts over the training period. Further, for the scale 
parameters C and D, the previous day’s estimates can serve as initials values, however, 
according to our experience, fixed starting values provide slightly better results. Finally, to 
enforce the non-negative definiteness of the parameter matrix C one can set C = CC J and 
perform the optimization with respect to C. 
































9 


Verification Rank Histogram, Wind Speed Verification Rank Histogram, Temperature Multivariate Rank Histogram 



Rank of Observation in Ensemble Rank of Observation in Ensemble Rank of Observation in Ensemble 


Figure 1: Verification rank histograms of the 8-member UMWE forecasts of maximum wind 
speed (left) and minimum temperature (center) and the multivariate rank histogram (right). 
Period: January 1, 2008 - December 31, 2008. 


4 Results 


As mentioned in the Introduction, the predictive performance of the bivariate EMOS model 
(see Section 13. lj) is tested on the eight-member UWME and on the ALADIN-HUNEPS 
ensemble of the HMS. The goodness of fit of the predictive distributions is quantified with 
the multivariate scores given in Section 13721 and th e obtained results are compared to th e fits 
of the indepe ndent EMOS models of wind speed (IThorarinsdottir and Gneitind. l2010li and 


temperature (Gneiting et al ., 2005|), the Gaussian copula method proposed by 


Moller etai 


(120131) . but with marginal distributions estimated by EMOS models, and the bivariate BMA 
model of Bar an and Moller (I2015 ). We remark that the parameters of the independent 
univariate EMOS models are estimated by minimizing the mean CRPS of the training data. 
For fitting the marginal predictive distributions in the Gaussian copula approach, we employ 
the same univariate EMOS models for wind speed and temperature as in the independent 
approach. Therefore, their model parameters are estimated by the minimum CRPS method 
as well. If one has a closed expression for the CRPS, which is the case both for the normal and 
the truncated normal distribution, this method usually gives better results than optimization 
with respect to the logarithmic score. 


4.1 University of Washington Mesoscale Ensemble 

4.1.1 Raw ensemble 


Several studies have verified that wind speed and temperature forecasts of the UWME ar e 
strongly underdispersive (see, e.g., IThorarinsdottir and Gneitind. 12010c iFralev et al. |2010|), 
and consequently uncalibrated. Obviously, the lack of calibration will remain valid if one 
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Probabilistic forecasts Median forecasts Mean forecasts 



ES 

A 

DS 

EE 

Q 

Qerr 

EE 

Q 

Qerr 

EMOS 

2.127 

0.025 

2.273 

2.982 

0.165 

0.182 

2.982 

0.157 

0.182 

Indep. EMOS 

2.118 

0.059 

2.206 

2.966 

0.164 

0.176 

2.966 

0.155 

0.178 

Copula 

2.088 

0.021 

2.169 

2.967 

0.162 

0.178 

2.967 

0.156 

0.179 

BMA 

2.110 

0.015 

2.250 

2.973 

0.154 

0.182 

2.972 

0.155 

0.183 

Raw ensemble 

2.562 

0.550 

0.773 

3.087 

0.017 

0.187 

3.072 

0.007 

0.189 


Table 1: Mean energy score (ES), reliability index (A) and mean determinant sharpness 
(DS) of probabilistic forecasts, mean Euclidean error (EE) of point forecasts (median/mean), 
empirical correlation (g) and empirical correlation of errors ( g err ) of wind speed and temper¬ 
ature components of point forecasts for the UWME. Empirical correlation of observations 
corresponding to the forecast cases: 0.125. 


cons i ders these ens emble forecasts together, as predictions of a bivariate weather quantity 
flBaran and Mollerl. 120151) . The underdispersive character of the raw ense mble can nicely 
be observed in Figure [H (identical to Figure 1 of Baran and Mollerl ( 20151 )) displaying the 
univariate verification rank histograms of wind speed and temperature forecasts together 
with their joint multivariate rank histogram. The corresponding reliability indices A are 
0.647, 0.842 and 0.550, respectively, and in many cases the raw ensemble either over-, or 
underestimates the verifying observation. Further, the need of bivariate modeling can be 
justified both by the positive correlation of 0.125 of the verifying observations of wind speed 
and temperature for calendar year 2008 taken along all dates and locations, and by the cor¬ 
relations of 0.187 and 0.189 of forecast errors of the ensemble median and mean, respectively. 


4.1.2 Bivariate EMOS calibration 


The first step of EMOS (and BMA) post-processing of ensemble forecast is the selection 
of the length of the rolling training period. In order to ensure comparability of the results 
with the find i ngs o f earl ier studies, we app l y the same 40 days training period length as in 
Moller et alj (12013m and Baran and Moller (120151) . This training period length was a result 
of an exploratory data analysis on a subset of the data set. Similar to the previous studies, 
we produce EMOS predictive PDFs for the whole calendar year 2008, using also the data 
from the last two months of calendar year 2007. After removing dates with missing data 
this means 291 calendar days with a total of 24302 forecast cases. As the eight ensemble 
members of the UWME are not exchangeable, for calibration we apply bivariate EMOS 
model (13.2p with M — 8. 

In case of the copula method, the data from calendar year 2007 are applied for estimating 
the correlation between the two weather quantities, and the resulting correlation matrix is 
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EMOS Rank Histogram 


Indep. EMOS Rank Histogram 



Rank of Observation in Ensemble 


Rank of Observation in Ensemble 


Copula Rank Histogram 


BMA Rank Histogram 



Rank of Observation in Ensemble Rank of Observation in Ensemble 


Figure 2: Multivariate rank histograms for EMOS (upper left), independent EMOS (upper 
right), Gaussian copula (lower left) and BMA (lower right) post-processed UWME forecasts 
of maximum wind speed and minimum temperature. 


then carried forward into th e ana lysis of the 20 08 da ta. This is in accordance with the BMA 
based copula calibration of iBaran and Moller (120151) . 

In Table [I] the verification scores calculated using the EMOS model (13.21) . the inde¬ 
pende nt EMOS models of wind speed and temperature, the copula model of Moller et al. 


( 20131 ) with EMOS post-processed margins, the BMA model of Bar an and Moller ( 20151 ) and 
the raw ensemble are given. Compared to the raw ensemble all post-processing techniques 
substantially improve the calibration of probabilistic forecasts which is quantified by the 
significant decrease of the mean energy score (ES) and reliability index (A) and can also be 
observed in Figure [2] showing the rank histograms of post-processed forecasts. These almost 
uniform histograms should be compared to the rank histograms of the raw ensemble plotted 
in Figure [TJ The price to pay for the better calibration is the loss in sharpness (see the 
corresponding values of DS), however, this is a direct consequence of the small dispersion of 
the raw ensemble (see again Figured]). Post-processing also results in slightly smaller mean 
Euclidean errors (EE) indicating more accurate median and mean forecasts. Further, the 
empirical correlations g of the wind and temperature components of the post-processed point 
forecasts are much closer to the correlation of 0.125 of the verifying observations than the 
corresponding correlations of the ensemble median and mean. This latter is a weakness of 
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Verification Rank Histogram, Wind Speed Verification Rank Histogram, Temperature Multivariate Rank Histogram 



Rank of Observation in Ensemble Rank of Observation in Ensemble Rank of Observation in Ensemble 


Figure 3: Verification rank histograms of the 11-member ALADIN-HUNEPS ensemble fore¬ 
casts of wind speed (left) and temperature ( center ) and the multivariate rank histogram 
(right). Period: April 1, 2012 - March 31, 2013. 


the raw ensemble, however, one should also remark that all error correlations g err (including 
the raw ensemble) are very similar to each other (around 0.180). 

Comparing the different post-processing techniques one can observe that the main dif¬ 
ference between the various approaches appears in the reliability index. The bivariate BMA 
model results in the smallest A value, followed by the copula and the bivariate EMOS 
methods, which is in line with shapes of the multivariate rank histograms plotted in Figure 
[2j Further, the large A value and the U-shaped rank histogram of the independent EMOS 
approach supports the idea of bivariate modeling. However, in the model choice one should 
also take into account that the copula method requires additional data for estimating the 
correlation matrix, whereas in the BMA and EMOS approaches the parameters are estimated 
using only the training data. Finally, in case of the latter two methods the computational 
costs (see Section I4.3j) might also have an influence on the decision. 


4.2 ALADIN-HUNEPS Ensemble 

4.2.1 Raw ensemble 

Wind speed and temperature forecasts of the ALADIN-HUNEPS EPS are better calibrated 
than those of the UWME, however, the rank histograms in Figure E] still exhibit a strong 
underdispersive character. The bivariate reliability index equals 0.317, whereas the reliabil¬ 
ity indices of wind speed and temperature are 0.322 and 0.455, respectively. The need of 
bivariate post-processing is again supported by the forecast error correlations of 0.119 and 
0.123 of the ensemble median and mean, respectively, however, in this case the verifying ob¬ 
servations of wind speed and temperature show a very slight negative correlation of —0.029. 
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Probabilistic forecasts Median forecasts Mean forecasts 



ES 

A 

DS 

EE 

Q 

Qerr 

EE 

Q 

Qerr 

EMOS 

1.442 

0.034 

1.478 

2.015 

-0.041 

0.132 

2.016 

-0.049 

0.132 

Indep. EMOS 

1.436 

0.051 

1.456 

2.002 

-0.033 

0.128 

2.002 

-0.044 

0.127 

Copula 

1.384 

0.075 

1.557 

2.000 

-0.036 

0.128 

2.000 

-0.039 

0.127 

BMA 

1.434 

0.031 

1.539 

2.004 

-0.032 

0.129 

2.007 

-0.041 

0.129 

Raw ensemble 

1.623 

0.327 

0.935 

2.102 

-0.068 

0.122 

2.083 

-0.060 

0.124 


Table 2: Mean energy score (ES), reliability index (A) and mean determinant sharpness (DS) 
of probabilistic forecasts, mean Euclidean error (EE) of point forecasts (median/mean), em¬ 
pirical correlation (g) and empirical correlation of errors ( g err ) of wind speed and temperature 
components of point forecasts for the ALADIN-HUNEPS ensemble. Empirical correlation 
of observations corresponding to the forecast cases: —0.033. 


This latter difference compared to the UWME, where this correlation equals 0.125, might 
be explained by the different types of wind and temperature quantities being examined (see 
Sections 12.11 and 12.211 . 


4.2.2 Bivariate EMOS calibration 


Similar to the case of the UWME, to ensure the comparability of the results with the 
bi variate BMA post- p rocess ing of the same forecast data, we keep the 40-day training period 
of Bar an and Mdller (12015 ). This particular training period length was the outcome of a 
preliminary data analysis consisting of univariate BMA and EMOS calibration of wind speed 
and temperature forecasts. Hence, ensemble forecasts, validating observations and predictive 
distributions are available for the period from May 12, 2012 to March 31, 2013, which means 
318 days and 3 180 forecast cases as 6 days with missing forecasts are excluded from the 
analysis. 


Further, the way the ALADIN-HUNEPS ensemble is generated (see Section ¥171$ induces 
a natural grouping of the ensemble members into two groups. The first group contains just 
the control member f c , whereas in the second are the 10 statistically indistinguishable 
ensemble members f pl ,...,f pl0 , initialized from randomly perturbed initial conditions. 
This results in the predictive PDF 


10 


MHa + BJ c + B r Y, f r ,e, C + DSD t , 


(4.1) 


1=1 


Bar an et al. (2013) 


which is a special case of model ( 13.31) . One should remark here that in 
a different grouping is also suggested (and later investigated in Bar an (2014): Bar an et al. 
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EMOS Rank Histogram 


Indep. EMOS Rank Histogram 


O O 



Rank of Observation in Ensemble 


Rank of Observation in Ensemble 


Copula Rank Histogram 


BMA Rank Histogram 


O O 



Rank of Observation in Ensemble 


Rank of Observation in Ensemble 


Figure 4: Multivariate rank histograms for EMOS (upper left), independent EMOS ( up¬ 
per right), Gaussian copula (lower left) and BMA (lower right) post-processed ALADIN- 
HUNEPS forecasts of instantaneous wind speed and temperature. 


1 20141) and Baran and Moller ( 20151) . too), where the odd and even numbered exchangeable 
ensemble members form two separate groups. This idea is justified by the method their 
initial conditions are generated, since only five perturbations are calculated and then they 
are added to (odd numbered members) and subtracted from (even numbered members) the 
unperturbed initial conditions. However, since in the present study the results corresponding 
to the two- and three-group models are rather similar, only the two-group case is reported. 

In line with the similar case study of Baran and Moller ( 20151 ). to estimate the correlation 
matrix of the Gaussian copula, additional data of the period from October 1, 2010 to March 
25, 2011 are utilized, and the estimated correlation matrix is employed for combining the 
univariate EMOS marginals for 2012/2013 in the Gaussian copula. 


The effects of statistical calibration of ensemble forecasts are quantified by the multivari¬ 
ate scores given in Table [21 Compared to the raw ensemble all four post-processing methods 
result in significantly lower energy scores and reliability indices (compare Figures [3] and [4]) 
and higher DS values. Again, the loss in determinant sharpness is an effect of the underdis- 
persive nature of the ensemble. However, here the increase in DS is around 60 %, whereas 
for the UWME the raw ensemble is almost three times sharper than the various predic¬ 
tive PDFs. This again indicates the better calibration of the ALADIN-HUNEPS ensemble 
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which is fully consistent with Figures Q] and [3] and the corresponding reliability indices given 
in Sections 14.1.11 and 14.2.11 respectively. Further, the ensemble median and mean vectors 
produce slightly larger Euclidean errors than the corresponding post-processed point fore¬ 
casts. Moreover, the empirical correlations of the components of the ensemble median and 
mean are almost the double of the nominal correlation —0.033 of observations, whereas the 
correlations of wind speed and temperature components of the BMA and EMOS point fore¬ 
casts are close to this value. Finally, both the ensemble median/mean and their calibrated 
counterparts exhibit almost the same forecast error correlations. 


From the competing post-processing methods the Gaussian copula approach results in 
the lowest energy score and Euclidean errors, however, the differences compared to the 
corresponding scores of the BMA and EMOS models (especially in the EE values) are rather 
small. Reliability indices show far larger variability and the highest scores belong to the 
copula model and to the independent EMOS approach. The A values in Table [2] are in 
accordance with the rank histograms in Figure SJ the rank his togra m of the copula method 
is strongly hump-shaped indicating over-dispersion (see, e.g., Gneiting et a/J, 2008), whereas 
the histogram of the independent EMOS approach exhibits some under-dispersion. For the 
ALADIN-HUNEPS ensemble the bivariate BMA model has the best overall performance 
closely followed by the bivariate EMOS method, however, similar to the case of the UWME 
the computational costs might also effect the model choice. 


4.3 Computational aspects 


For all EMOS methods which have been developed so far the most time-consuming and 
problematic part of ensemble post-processing is the numerical optimization used in parameter 
estimation. In case of bivariate EMOS calibration of the ALADIN-HUNEPS ensemble only 
the robust Nelder-Mead algorithm occurs to be reliable, as one has to estimate 18 free 
parameters with the help of 400 forecast cases of the training data. For the UWME the 
data/parameter ratio is much better, as 42 free parameters have to be estimated using on 
average 3354 forecast cases. For this data set the reported Nelder-Mead and the faster BFGS 
algorithm give almost the same results. 


In case of the BMA calibration the bottleneck with respect to the computation costs 
is the EM algorithm applied for ML estimation of the parameters. The bivariate BMA 
model of iBaran and Mollerl (120151 1 makes use of a mod i ficatio n of the truncated data EM 
algorithm for Gaussian mixture models (Lee and Scotti . 120121 ) which operates with closed 
formulae and there is no need of numerical optimization. However, due to the large number 
of free parameters (UWME: 59; ALADIN-HUNEPS: 17) it requires quite a lot of iterations 
resulting in long computation times. 


The Gaussian copula method starts with very fast univariate EMOS calibration, how¬ 
ever, it utilizes an additional data set for estimating the correlation matrix of the Gaussian 
copula and additional post-processing of the univariate predictive PDFs. Hence, in terms of 
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UWME 


ALADIN-HUNEPS 




Computation time in seconds 


Computation time in seconds 


(a) 


(b) 


Figure 5: Densities of computation times for the bivariate BMA and EMOS models, a) 
UWME for the calendar year 2008; b) ALADIN-HUNEPS ensemble for the period May 12, 
2012 - March 31, 2013. 


computational efficiency this method is not comparable with the bivariate approaches and 
it is excluded from our analysis. 

Figures [5]r and (5b show the kernel density estimates of the distribution of computation 
times over the days in the verification period for bivariate BMA and EMOS models (imple¬ 
mented in R) for the UWME and ALADIN-HUNEPS ensemble, respectively, calculated on a 
portable computer under a 64 bit Fedora 20 operating system (Intel Quad Core i7-4700MQ 
CPU (2.40GHz x 4), 20 Gb RAM). We remark that in Figured the density of computation 
times of the EMOS model with BFGS optimization is also plotted. The densities displayed 
in Figure [5] clearly show that in terms of computation time the EMOS model outperforms 
the BMA approach. The same conclusion can be derived from Table E] where the median, 
mean and standard deviation of the computation times are given. However, one should also 
remark that these computation times are still too long for an operational use. 


5 Conclusions 

We introduce a new EMOS model for joint calibration of ensemble forecasts of wind speed 
and temperature providing a predictive PDF which follows a bivariate normal distribution 
truncated from below at zero in its first coordinate. The model is tested on wind speed and 
temperature forecasts of the eight-member University of Washington mesoscale ensemble and 
of the eleven-member ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service. 
These ensemble prediction systems differ both in the weather quantities being forecasted and 
in the generation of the ensemble members. 
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UWME ALADIN-HUNEPS 

Model EMOS BMA EMOS BMA 

Nelder-Mead BFGS Nelder-Mead 

median 3349.443 1140.702 4419.288 131.609 436.873 

mean 3475.681 1228.142 4801.247 150.008 459.560 

std. dev. 1177.651 343.633 1823.083 69.678 345.187 

Table 3: Median, mean and standard deviation of the computation times in seconds allocated 
to the parameter estimation for individual days in the verification period (UWME: calendar 
year 2008, 24302 forecast cases; ALADIN-HUNEPS ensemble: May 12, 2012 - March 31, 
2013, 3180 forecast cases). 


Using appropriate verification measures (energy score, reliability index and determinant 
sharpness of probabilistic and Euclidean errors, correlations, as well as correlations of er¬ 
rors of median/mean forecasts) the predictive performance of the bivariate EMOS model 
is compared to the forecast skills of the ind epe n dent EMOS c alibration of wind speed and 

(2013) based on univariate EMOS 


temperature, the Gaussian copula method of Mol 


models, the bivariate BMA model suggested by 
semblc vectors as well. 


er et al 


Bar an and Moller (20151) and the raw en- 


From the results of the presented case studies one can conclude that compared to the 
raw ensemble post-processing always improves the calibration of probabilistic and accuracy 
of point forecasts. Further, in terms of predictive performance the bivariate EMOS model 
is able to keep up with the other two bivariate methods. Concerning the computational 
costs it outperforms the bivariate BMA method, whereas compared to the Gaussian copula 
approach it does not require an additional data set for estimating the correlations. 
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